Minitron-8B-Base is a large language model obtained by pruning Nemotron-4 15B and then applying distillation-based continued training, requiring up to 40× fewer training tokens and roughly 1.8× less compute than training from scratch.